{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "978146e2",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/llm/openvino.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f717d3d4-942b-4d86-9435-fc44b3ac6d39",
   "metadata": {},
   "source": [
    "# Optimum Intel LLMs optimized with IPEX backend\n",
    "\n",
    "[Optimum Intel](https://github.com/rbrugaro/optimum-intel) accelerates Hugging Face pipelines on Intel architectures leveraging [Intel Extension for Pytorch, (IPEX)](https://github.com/intel/intel-extension-for-pytorch) optimizations\n",
    "\n",
    "Optimum Intel models can be run locally through `OptimumIntelLLM` entitiy wrapped by LlamaIndex :"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90cf0f2e-8d8d-4e42-81bf-866c759221e1",
   "metadata": {},
   "source": [
    "In the below line, we install the packages necessary for this demo:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f413f179",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-llms-optimum-intel"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3dac8f9f-7136-43f7-9e9f-de679e74d66e",
   "metadata": {},
   "source": [
    "Now that we're set up, let's play around:"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2c577674",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "86028752",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0465029c-fe69-454a-9561-55f7a382b2e2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.optimum_intel import OptimumIntelLLM"
   ]
  },
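   {
    "cell_type": "markdown",
    "id": "ab12cd34",
    "metadata": {},
    "source": [
     "The helper functions below convert LlamaIndex chat messages and completion strings into the prompt format used in this example, wrapping each turn in `<|system|>`, `<|user|>`, and `<|assistant|>` tags:"
    ]
   },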
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "49122583",
   "metadata": {},
   "outputs": [],
   "source": [
    "def messages_to_prompt(messages):\n",
    "    prompt = \"\"\n",
    "    for message in messages:\n",
    "        if message.role == \"system\":\n",
    "            prompt += f\"<|system|>\\n{message.content}</s>\\n\"\n",
    "        elif message.role == \"user\":\n",
    "            prompt += f\"<|user|>\\n{message.content}</s>\\n\"\n",
    "        elif message.role == \"assistant\":\n",
    "            prompt += f\"<|assistant|>\\n{message.content}</s>\\n\"\n",
    "\n",
    "    # ensure we start with a system prompt, insert blank if needed\n",
    "    if not prompt.startswith(\"<|system|>\\n\"):\n",
    "        prompt = \"<|system|>\\n</s>\\n\" + prompt\n",
    "\n",
    "    # add final assistant prompt\n",
    "    prompt = prompt + \"<|assistant|>\\n\"\n",
    "\n",
    "    return prompt\n",
    "\n",
    "\n",
    "def completion_to_prompt(completion):\n",
    "    return f\"<|system|>\\n</s>\\n<|user|>\\n{completion}</s>\\n<|assistant|>\\n\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3e21cef-b3c3-4ddd-a70c-728de440648e",
   "metadata": {},
   "source": [
    "### Model Loading\n",
    "\n",
    "Models can be loaded by specifying the model parameters using the `OptimumIntelLLM` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a27feba3-d027-4d10-b1af-1e130e764a67",
   "metadata": {},
   "outputs": [],
   "source": [
    "oi_llm = OptimumIntelLLM(\n",
    "    model_name=\"Intel/neural-chat-7b-v3-3\",\n",
    "    tokenizer_name=\"Intel/neural-chat-7b-v3-3\",\n",
    "    context_window=3900,\n",
    "    max_new_tokens=256,\n",
    "    generate_kwargs={\"temperature\": 0.7, \"top_k\": 50, \"top_p\": 0.95},\n",
    "    messages_to_prompt=messages_to_prompt,\n",
    "    completion_to_prompt=completion_to_prompt,\n",
    "    device_map=\"cpu\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e25c7162",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = oi_llm.complete(\"What is the meaning of life?\")\n",
    "print(str(response))"
   ]
  },
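   {
    "cell_type": "markdown",
    "id": "5e6f7a8b",
    "metadata": {},
    "source": [
     "### Chat\n",
     "\n",
     "A minimal sketch using the non-streaming `chat` endpoint, assuming `OptimumIntelLLM` exposes the standard LlamaIndex chat interface:"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "9c0d1e2f",
    "metadata": {},
    "outputs": [],
    "source": [
     "from llama_index.core.llms import ChatMessage\n",
     "\n",
     "messages = [\n",
     "    ChatMessage(role=\"system\", content=\"You are a helpful assistant\"),\n",
     "    ChatMessage(role=\"user\", content=\"What is the meaning of life?\"),\n",
     "]\n",
     "\n",
     "# chat() is the standard LlamaIndex non-streaming chat endpoint\n",
     "resp = oi_llm.chat(messages)\n",
     "print(resp)"
    ]
   },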
  {
   "cell_type": "markdown",
   "id": "dda1be10",
   "metadata": {},
   "source": [
    "### Streaming\n",
    "\n",
    "Using `stream_complete` endpoint "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12e0f3c0",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = oi_llm.stream_complete(\"Who is Mother Teresa?\")\n",
    "for r in response:\n",
    "    print(r.delta, end=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c87c383",
   "metadata": {},
   "source": [
    "Using `stream_chat` endpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2db801a8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.llms import ChatMessage\n",
    "\n",
    "messages = [\n",
    "    ChatMessage(\n",
    "        role=\"system\",\n",
    "        content=\"You are an American chef in a small restaurant in New Orleans\",\n",
    "    ),\n",
    "    ChatMessage(role=\"user\", content=\"What is your dish of the day?\"),\n",
    "]\n",
    "resp = oi_llm.stream_chat(messages)\n",
    "\n",
    "for r in resp:\n",
    "    print(r.delta, end=\"\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
